#loading packages
library(lubridate)
library(dplyr)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.5 ✓ stringr 1.4.0
## ✓ tidyr 1.1.2 ✓ forcats 0.5.0
## ✓ readr 1.4.0
library(ggridges) # for joy plots
library(plotly)
library(gganimate) # for adding animation layers to ggplots
library(gifski) # for creating the gif (no need to load this library every time, but it must be installed)
#loading data
spotify <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## track_id = col_character(),
## track_name = col_character(),
## track_artist = col_character(),
## track_album_id = col_character(),
## track_album_name = col_character(),
## track_album_release_date = col_character(),
## playlist_name = col_character(),
## playlist_id = col_character(),
## playlist_genre = col_character(),
## playlist_subgenre = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
spotify_rap <- spotify %>%
filter(playlist_genre == "rap")
randb <- spotify %>%
filter(playlist_genre == "r&b") %>%
select(-track_id, -track_album_id, -playlist_id, -playlist_name) %>%
filter(track_popularity >= 75)
Why did we do an analysis on Spotify? Why is the data significant, and why should people care? Introduce the data to the audience.
Using this dataset, we hope to study the technicalities of music and
Aside from personal interest…
Data retrieved from GitHub: https://github.com/rfordatascience/tidytuesday/blob/faca0b6bd282998693007c329e3f4b917a5fd7a8/data/2020/2020-01-21/readme.md Who collected the data and what purpose does it serve? Who funded the data collection? Are there any possible biases? What are the implications of analyzing this dataset, ethical or otherwise?
genre_pop <- spotify %>%
filter(track_popularity >= 75) %>%
mutate(ymd_release = ymd(track_album_release_date),
year = year(ymd_release)) %>%
group_by(year, playlist_genre) %>%
summarize(avg_popularity = mean(track_popularity)) %>%
ggplot(aes(x = year, y = avg_popularity, color = playlist_genre)) +
geom_point() +
labs(title="Average song popularity by genre per year",
subtitle = "Overall, as music becomes more accessible, average popularity across all genres is on the rise.",
x = "",
y = "",
color = "Genre") +
theme_classic()
## Warning: Problem with `mutate()` input `ymd_release`.
## ℹ 68 failed to parse.
## ℹ Input `ymd_release` is `ymd(track_album_release_date)`.
## Warning: 68 failed to parse.
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
ggplotly(genre_pop)
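The parse warnings suggest that some `track_album_release_date` values are not full dates. One plausible cause, which is an assumption here rather than something checked against the data, is that some albums list only a year or a year and month. Since the plot only needs the year, a minimal base-R sketch on made-up values could take the first four characters instead of parsing a full date:

```r
# Toy release dates (made up for illustration): full date, year only,
# year-month, and a missing value.
release <- c("2019-12-13", "2012", "1998-07", NA)

# The first four characters are the year whenever a value is present;
# a missing date stays missing.
release_year <- as.integer(substr(release, 1, 4))

release_year
```

In the pipeline above, the same idea would amount to something like `year = as.integer(substr(track_album_release_date, 1, 4))` inside `mutate()`, which would keep the rows that currently fail to parse.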
prelim_graph <- spotify %>%
ggplot(aes(y = playlist_genre, x = track_popularity)) +
labs(title = "Song Popularity by Genre",
x = "", y = "",
subtitle = "Song popularity is measured from 0-100, with higher numbers being indicative of more popularity.\nHighest median popularities belong to pop and latin with an overall median popularity of 40",
caption = "Alex Ismail, Malek Kaloti, Brian Lee") +
theme_classic() +
theme(plot.title.position = "plot",
plot.title = element_text(size = 20, face = "bold"),
plot.subtitle = element_text(size = 10, face = "italic")) +
geom_boxplot() +
geom_vline(aes(xintercept = median(track_popularity, na.rm = TRUE)), color = "blue")
prelim_graph
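The subtitle's claims about medians can be checked with a short calculation alongside the boxplot. The sketch below uses a small made-up data frame (the values are hypothetical, not taken from the Spotify data) to show one way of computing per-genre medians and the overall median that the blue line marks:

```r
# Hypothetical toy data standing in for the spotify data frame.
toy <- data.frame(
  playlist_genre   = c("pop", "pop", "pop", "rap", "rap", "rap", "edm", "edm", "edm"),
  track_popularity = c(40, 55, 70, 30, 45, 60, 10, 35, 50)
)

# Median popularity within each genre, highest first.
genre_medians <- sort(
  tapply(toy$track_popularity, toy$playlist_genre, median),
  decreasing = TRUE
)

# Overall median across all songs (what geom_vline draws in the plot).
overall_median <- median(toy$track_popularity, na.rm = TRUE)

genre_medians
overall_median
```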
Rap is a particularly fascinating genre to investigate with the Spotify data, because it has gone through several changes in audience and style, and we can ask which traits of the music have correlated with popularity along the way. Though a relatively new genre that arrived on the wider music scene in the 80s, rap has cycled through a myriad of trends and style variations. Fans of old-school rap from the 80s and 90s may dislike today’s artists like Drake and Eminem for having modernized the genre too much. Fans of modern rap may get bored of the authentic sound of artists like Run-DMC or Tupac. Are there trends that tie all of rap together and explain what makes a song popular?
The first and most natural observations to make are on the overarching metrics that Spotify provides. Using the descriptions provided, I was most interested in how the following values correlate with track popularity: danceability, due to rap’s heavy emphasis on rhythm and beats; energy, due to some artists’ signature style of shouting to “hype” up a crowd (e.g. Lil Jon, DMX); the inversely related variables speechiness and instrumentalness, due to other artists’ signature of rapping as fast as possible (e.g. Eminem, Busta Rhymes); and valence, because of the perceived association between rap and violence, drugs, and other less-than-righteous topics.
Oddly, the biggest conclusion I drew from this graph was not a positive or negative correlation but a lack of connection between valence and popularity. For a genre with a reputation for being connected with gangs, guns, and drugs, valence and popularity show no correlation at all. Beyond that, there is a moderately strong correlation between popularity and danceability, as I had expected given the prevalence of beats and rhythms in rap. The energy line shows that the highest percentage of popular songs sits at around 0.5 energy, which suggests that too much energy can take away from a song’s popularity. Finally, the speechiness/instrumentalness variable shows that songs at the extreme end of speechiness (0.8+) are most likely to be popular.
Beyond the stats, there was one more observation I wanted to make on rap music. In my experience listening to rap, some of my favorite songs are remixes, features, or any other format in which multiple artists put verses on the same song. Songs like “Life is Good” by Drake and Future, or remixes like the version of Travis Scott’s “HIGHEST IN THE ROOM” that brings in Lil Baby, add a certain freshness and break up three consecutive minutes of one artist rapping into a fun back-and-forth between styles. Below is a graphic comparing the popularity rates of those songs against solo songs, for rap and for other genres.
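Readings like “no correlation” or “moderately strong correlation” can be backed up with a rank correlation per feature. The sketch below is only an illustration on hypothetical numbers (the data frame `toy` and its values are made up, not drawn from `spotify_rap`); Spearman’s method is chosen here as an assumption because it does not require a linear relationship:

```r
# Hypothetical toy data standing in for spotify_rap.
toy <- data.frame(
  track_popularity = c(20, 35, 50, 65, 80, 90),
  danceability     = c(0.45, 0.50, 0.62, 0.70, 0.78, 0.85),
  valence          = c(0.60, 0.20, 0.75, 0.30, 0.55, 0.40)
)

features <- c("danceability", "valence")

# Spearman rank correlation of each feature with popularity.
rho <- sapply(
  toy[features],
  function(x) cor(x, toy$track_popularity, method = "spearman")
)

round(rho, 2)
```

In this toy example danceability rises with popularity in lockstep while valence shows almost no relationship, which is the kind of pattern described above; the real data would need the same check before drawing firm conclusions.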
Multiple_Artist_Graph <- spotify %>%
mutate(track_name_lower = str_to_lower(track_name),
remix = str_detect(track_name_lower, "remix"), # lowercase pattern, because the title was lowercased above
feature = str_detect(track_name_lower, "feat"),
ma_prep = remix|feature,
ma_prep2 = replace_na(ma_prep, FALSE),
multiple_artists = if_else(ma_prep2, true = "Multiple Artists", false = "One Artist"),
popular = track_popularity > 75) %>%
group_by(multiple_artists, playlist_genre) %>%
summarize(prop_pop = mean(popular)*100) %>%
mutate(genre = fct_relevel(playlist_genre, "rap")) %>%
ggplot() +
geom_col(aes(x = multiple_artists, y = prop_pop), fill = "black") +
facet_wrap(~genre) +
labs(title = "Popularity of Songs Containing Multiple Artists Across Genres",
x = "", y = "Percent of Songs Popular") +
theme_classic() +
theme(plot.title.position = "plot",
plot.title = element_text(size = 20, face = "bold"),
plot.subtitle = element_text(size = 10, face = "italic"))
## `summarise()` regrouping output by 'multiple_artists' (override with `.groups` argument)
ggplotly(Multiple_Artist_Graph)
Rap had the largest change between songs with multiple artists and songs with only one, with a gap of 10 percentage points. Based on the disparity in other genres like R&B and pop, I hypothesize that these trends are connected with the modern-day music scene. Collaboration between artists at the top of their field has become more commonplace, including mega-tracks like “Forever”, with verses from Eminem, Kanye West, Drake, and Lil Wayne all in the same song. These collaborations can create songs with blended styles, which further the development of rap as a unique genre.
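The multiple-artist flag behind this comparison depends on matching words like “remix” and “feat” in a title regardless of capitalization. A minimal base-R sketch of that logic, on made-up titles and popularity scores (all hypothetical), might look like this:

```r
# Hypothetical titles and popularity scores, for illustration only.
titles     <- c("Song A (feat. Artist B)", "Song C - Remix", "Song D", "Song E", NA)
popularity <- c(90, 80, 60, 85, 70)

# Case-insensitive match; grepl() returns FALSE for a missing title,
# so no separate NA handling is needed.
multi <- grepl("remix|feat", titles, ignore.case = TRUE)

# Same popularity cutoff as the chunk above (strictly greater than 75).
popular <- popularity > 75

# Percent of popular songs among solo (FALSE) and multi-artist (TRUE) tracks.
pct_popular <- tapply(popular, multi, mean) * 100

pct_popular
```

Matching on the title alone is a rough proxy: it would miss collaborations credited only in the artist field and would catch any title that happens to contain “feat”.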
In this section, I want to take a closer look at one of my favorite genres of music, R&B. I think I love it so much because it’s often good music to unwind to – it’s smooth, slow, and relaxing. I also love its versatility! R&B can fit the mood of anything from a gloomy, rainy day to a bright, sunny day. But why? What characteristics make R&B such a great genre to listen to? Using the Spotify dataset and some visualizations which look at the specific characteristics of the most popular R&B songs (songs with a popularity rating of above 75), I hope to come closer to answering these questions.
randb %>%
select(track_name, track_artist, playlist_genre, playlist_subgenre, track_popularity, danceability, energy, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, duration_ms) %>%
arrange(desc(track_popularity)) %>%
head(12) %>%
knitr::kable()
| track_name | track_artist | playlist_genre | playlist_subgenre | track_popularity | danceability | energy | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | duration_ms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ROXANNE | Arizona Zervas | r&b | urban contemporary | 99 | 0.621 | 0.601 | -5.616 | 0 | 0.1480 | 0.05220 | 0.000000 | 0.4600 | 0.457 | 163636 |
| ROXANNE | Arizona Zervas | r&b | hip pop | 99 | 0.621 | 0.601 | -5.616 | 0 | 0.1480 | 0.05220 | 0.000000 | 0.4600 | 0.457 | 163636 |
| The Box | Roddy Ricch | r&b | urban contemporary | 98 | 0.896 | 0.586 | -6.687 | 0 | 0.0559 | 0.10400 | 0.000000 | 0.7900 | 0.642 | 196653 |
| Memories | Maroon 5 | r&b | urban contemporary | 98 | 0.764 | 0.320 | -7.209 | 1 | 0.0546 | 0.83700 | 0.000000 | 0.0822 | 0.575 | 189486 |
| Blinding Lights | The Weeknd | r&b | urban contemporary | 98 | 0.513 | 0.796 | -4.075 | 1 | 0.0629 | 0.00147 | 0.000209 | 0.0938 | 0.345 | 201573 |
| Blinding Lights | The Weeknd | r&b | hip pop | 98 | 0.513 | 0.796 | -4.075 | 1 | 0.0629 | 0.00147 | 0.000209 | 0.0938 | 0.345 | 201573 |
| The Box | Roddy Ricch | r&b | hip pop | 98 | 0.896 | 0.586 | -6.687 | 0 | 0.0559 | 0.10400 | 0.000000 | 0.7900 | 0.642 | 196653 |
| Tusa | KAROL G | r&b | hip pop | 98 | 0.803 | 0.715 | -3.280 | 1 | 0.2980 | 0.29500 | 0.000134 | 0.0574 | 0.574 | 200960 |
| Memories | Maroon 5 | r&b | hip pop | 98 | 0.764 | 0.320 | -7.209 | 1 | 0.0546 | 0.83700 | 0.000000 | 0.0822 | 0.575 | 189486 |
| Circles | Post Malone | r&b | hip pop | 98 | 0.695 | 0.762 | -3.497 | 1 | 0.0395 | 0.19200 | 0.002440 | 0.0863 | 0.553 | 215280 |
| Don’t Start Now | Dua Lipa | r&b | urban contemporary | 97 | 0.794 | 0.793 | -4.521 | 0 | 0.0842 | 0.01250 | 0.000000 | 0.0952 | 0.677 | 183290 |
| everything i wanted | Billie Eilish | r&b | urban contemporary | 97 | 0.704 | 0.225 | -14.454 | 0 | 0.0994 | 0.90200 | 0.657000 | 0.1060 | 0.243 | 245426 |
Above are the 12 most popular rows in the R&B genre. They cover only 8 distinct songs, because four of them (Arizona Zervas’ ROXANNE, Roddy Ricch’s The Box, Maroon 5’s Memories, and The Weeknd’s Blinding Lights) each appear under two subgenres. All of them were released in 2019, and all are categorized under my two favorite subgenres of R&B, Urban Contemporary and Hip Pop. All of them also boast a danceability score above 0.5, and most of them (with the exception of Maroon 5’s Memories and Billie Eilish’s everything i wanted) have energy scores above 0.5. We can also see that, across the board, the songs have low speechiness and instrumentalness scores (with the exception of Billie Eilish’s everything i wanted, which has a high instrumentalness score). Interestingly, all of the songs fall within a valence of roughly 0.2 to 0.7. The other characteristics are quite varied. So, for the purposes of my analysis of the R&B genre, I will focus only on the song characteristics that have clear trends across the genre: danceability, energy, speechiness, instrumentalness, and valence.
In the exploratory phase of my analysis of the R&B genre, the most obvious characteristic to start with was a song’s subgenre. Are certain subgenres more likely to have popular songs because they have more fans and listeners than others? In the density plot below, Neo-Soul and New Jack Swing have the tallest curves, which at first glance suggests that they have the highest quantity of popular songs.
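Because the same track can sit in playlists from more than one subgenre, counting distinct songs requires dropping repeated name and artist pairs. The sketch below rebuilds the first six rows of the table by hand and removes the repeats with base R; treating name plus artist as the identity of a song is an assumption made for this illustration:

```r
# First six rows of the table above, re-entered by hand.
top <- data.frame(
  track_name        = c("ROXANNE", "ROXANNE", "The Box", "Memories",
                        "Blinding Lights", "Blinding Lights"),
  track_artist      = c("Arizona Zervas", "Arizona Zervas", "Roddy Ricch",
                        "Maroon 5", "The Weeknd", "The Weeknd"),
  playlist_subgenre = c("urban contemporary", "hip pop", "urban contemporary",
                        "urban contemporary", "urban contemporary", "hip pop"),
  track_popularity  = c(99, 99, 98, 98, 98, 98)
)

# Keep the first row for each (name, artist) pair.
unique_songs <- top[!duplicated(top[c("track_name", "track_artist")]), ]

nrow(unique_songs)
```

With dplyr loaded, `distinct(track_name, track_artist, .keep_all = TRUE)` would be an equivalent way to express the same step before calling `head()`.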
randb %>%
ggplot(aes(x = track_popularity, fill = playlist_subgenre)) +
geom_density(alpha = 0.1) +
theme_classic() +
labs(title = "Do certain subgenres have more popular songs?",
subtitle = "This density plot only includes songs with a popularity of >=75.\nIt seems that Neo-Soul and New Jack Swing have the most popular songs.\n\nR&B Subgenre: {closest_state}",
x = "Track Popularity",
y = "",
fill = "R&B subgenre",
caption = "Visualization created by Brian Lee") +
transition_states(playlist_subgenre, transition_length = 3, state_length = 1)
#get rid of axes, make subtitle descriptive
anim_save("randb_density.gif")
knitr::include_graphics("randb_density.gif")
In the density plot above, Neo-Soul and New Jack Swing both have most of their popular songs at the lower end of the spectrum (75-85). Urban Contemporary and Hip Pop follow similar trends, but their density curves are not as tall as those of the other two subgenres, which seems to signal that the former two subgenres have more songs classified as “popular” than the latter two.
I believe this trend could stem from the huge increase in the production of hip pop and urban contemporary music. Streaming services such as Spotify make it easier than ever for small creators to attain platforms, and advances in technology make it easier to produce and release music from one’s own bedroom. The result may be an oversaturated music industry: there are more songs being released than ever.
randb %>%
group_by(playlist_subgenre) %>%
summarize(num_of_songs = n(), avg_pop = mean(track_popularity)) %>%
knitr::kable()
## `summarise()` ungrouping output (override with `.groups` argument)
| playlist_subgenre | num_of_songs | avg_pop |
|---|---|---|
| hip pop | 281 | 82.95018 |
| neo soul | 41 | 78.90244 |
| new jack swing | 4 | 77.50000 |
| urban contemporary | 204 | 82.13725 |
Despite their tall density curves, Neo-Soul and New Jack Swing are on average slightly less popular than hip pop and urban contemporary. Another interesting observation is the sheer lack of popular songs for Neo-Soul and New Jack Swing: 41 and 4 songs, against 281 for hip pop and 204 for urban contemporary.
A quick Google search will reveal that both Neo-Soul and New Jack Swing were subgenres of R&B that were popular during the 1980s and 90s. Their tall density curves could be due to this fact. Because high-quality handheld microphones and production equipment for home use were not as abundant as they are now, artists had to rely on record labels and managers to pay for studios and expensive equipment, which led to less music being produced. Additionally, because labels and management agencies essentially “invested” in artists from whom they expected a high profit margin, the artists who were given a platform by these agencies were more likely to be successful. With a smaller pool of music, and with popular songs making up more of that small pool, tall density curves such as the ones we see in the visualization above for Neo-Soul and New Jack Swing are possible, and could help explain the difference in quantity between the four subgenres.
As I move forward in my analysis to look at the specific characteristics of popular R&B songs, I will restrict myself to the two subgenres with more cases to look at, which are also my two personal favorites: Hip Pop and Urban Contemporary.
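The mismatch between tall density curves and small song counts follows from how a kernel density estimate is scaled: each group’s curve encloses an area of about 1, so its height reflects how concentrated the scores are, not how many songs there are. The sketch below illustrates this with simulated scores (the group sizes and ranges are assumptions chosen for the example, not the real data):

```r
set.seed(1)

# Simulated popularity scores: a tiny, tightly clustered group and a
# large, spread-out group.
small_group <- runif(4, min = 75, max = 80)
large_group <- runif(280, min = 75, max = 100)

# Approximate area under a density curve with a Riemann sum over the
# evenly spaced grid that density() returns.
curve_area <- function(x) {
  d <- density(x)
  sum(d$y) * diff(d$x[1:2])
}

curve_area(small_group)
curve_area(large_group)

# The small group has the taller peak despite having far fewer songs.
max(density(small_group)$y) > max(density(large_group)$y)
```

In ggplot2, mapping `y = after_stat(count)` inside `geom_density()` is one way to scale each curve by its group size, which may make the plot easier to reconcile with the count table.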
randb %>%
group_by(playlist_subgenre) %>%
filter(playlist_subgenre == c("hip pop", "urban contemporary")) %>%
summarise_at(c("track_popularity", "danceability", "energy", "speechiness", "instrumentalness", "valence"), mean, na.rm = TRUE) %>%
knitr::kable()
## Warning in playlist_subgenre == c("hip pop", "urban contemporary"): longer
## object length is not a multiple of shorter object length
## Warning in playlist_subgenre == c("hip pop", "urban contemporary"): longer
## object length is not a multiple of shorter object length
| playlist_subgenre | track_popularity | danceability | energy | speechiness | instrumentalness | valence |
|---|---|---|---|---|---|---|
| hip pop | 82.62411 | 0.6985887 | 0.6000979 | 0.1304929 | 0.0120022 | 0.4780922 |
| urban contemporary | 81.98039 | 0.6823333 | 0.5401578 | 0.1340971 | 0.0135849 | 0.4606735 |
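The “longer object length is not a multiple of shorter object length” warning comes from comparing a column to a two-element vector with `==`, which recycles the vector down the rows and compares element by element instead of testing membership. A small sketch on a made-up vector shows the difference from `%in%`:

```r
# Made-up subgenre column with five rows.
subgenre <- c("hip pop", "urban contemporary", "urban contemporary",
              "hip pop", "neo soul")
keep <- c("hip pop", "urban contemporary")

# `==` recycles `keep` as hip pop, urban contemporary, hip pop, ... and
# compares position by position (warning silenced here for the demo).
recycled <- suppressWarnings(subgenre == keep)

# `%in%` asks, for each row, whether its value is any element of `keep`.
matched <- subgenre %in% keep

sum(recycled)
sum(matched)
```

Here `==` keeps only the rows that happen to line up with the recycled vector, while `%in%` keeps every hip pop and urban contemporary row. Using `filter(playlist_subgenre %in% c("hip pop", "urban contemporary"))` in the chunk above would therefore be expected to include more rows and shift the averages in the table somewhat.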
# Add graph
As it becomes easier to produce and release music from one’s own bedroom, and as streaming platforms such as Apple Music and Spotify make music increasingly accessible to everyone, we believe our analysis has important implications: it can help listeners find new songs they like and help platforms build algorithms that give their users better and more relevant song recommendations.
Of course, correlation does not equal causation. Just because the
Thanks to streaming platforms such as Spotify and Apple Music, small creators are also given a platform for creative release. Our analyses of pop, rap, and R&B can also help small artists grow their own platforms and cater to the interests of specific audiences. At a time when the consumption of art (whether in the form of movies, music, or television) is essential to one’s mental wellbeing, our analysis can help boost these efforts. By asking the question “What makes a song in a given genre popular?”, we have taken a close look at the specific characteristics of songs with a popularity rating of 75 or higher.